Histological Assessment of Pleomorphism

This invention relates to a method, an apparatus and a computer program for
histological assessment of pleomorphism. It is particularly (although not
exclusively) relevant to the assessment of histological slides to provide clinical
information on potentially cancerous tissue such as breast cancer tissue. The method is
also relevant to colon and cervical cancer.

Breast cancer is a common form of female cancer: once a lesion indicative of breast
cancer has been detected, tissue samples are taken and examined by a histopathologist
to establish a diagnosis, prognosis and treatment plan. However, pathological analysis
of tissue samples is a time-consuming and inaccurate process. It entails interpretation of
images by the human eye, which is highly subjective: it is characterised by considerable
inconsistency between observations of the same samples by different observers, and even by
the same observer at different times. For example, two different observers assessing the
same tissue samples may easily give different opinions for a number of the slides; this
difference may be as high as 30%. The problem is exacerbated by heterogeneity, i.e. the
complexity of some tissue sample features.

There is a need to provide an objective measurement of the degree of pleomorphism to 
support the pathologist's diagnosis and the patient's treatment. 

The present invention provides a method of histological assessment of pleomorphism by
identifying image regions potentially corresponding to cells in histological image data,
characterised in that the method also includes determining perimeters and areas of
identified image regions, calculating region shape factors from the perimeters and areas
and determining an assessment of pleomorphism from the shape factors.

The invention provides the advantage of an objective measurement of the degree of
pleomorphism to inform a pathologist's diagnosis and patient treatment.

The shape factors are preferably rational functions of perimeters and areas. Each shape
factor may be equal or proportional to the square of the perimeter divided by the area, i.e. $P^2/A$.




The step of determining an assessment of pleomorphism may include determining a
mean or median of the shape factors. It may include thresholding the mean or median of
the shape factors. It may include determining pleomorphism as being relatively low,
moderate or high according to whether the mean or median of the shape factors is
relatively low, moderate or high.



The step of identifying image regions may include filtering the image data to overwrite
regions which are not of interest using a filtering process which does not appreciably
affect image region perimeter. Overwriting regions may include setting relatively small
image regions to a background pixel value and setting hole pixels in relatively larger
image regions to a non-hole image region pixel value.

The step of identifying image regions may include using principal component analysis to
derive monochromatic image data. It may include dividing the image data into
overlapping sub-images, removing from the sub-images:
a) image regions touching or intersecting sub-image boundaries,
b) unsuitably small image regions, and
c) holes in relatively large image regions, and

reassembling the sub-images.

In another aspect, the present invention provides computer apparatus for histological
assessment of pleomorphism by identification of image regions potentially corresponding
to cells in histological image data, characterised in that the apparatus is programmed to
execute the steps of determining perimeters and areas of identified image regions,
calculating region shape factors from the perimeters and areas and determining an
assessment of pleomorphism from the shape factors.

In a further aspect, the present invention provides a computer program for use in
histological assessment of pleomorphism from identification of image regions potentially
corresponding to cells in histological image data, characterised in that the program
contains instructions to control computer apparatus to determine perimeters and areas
of identified image regions, calculate region shape factors from the perimeters and
areas and determine an assessment of pleomorphism from the shape factors.

The computer apparatus and computer program aspects of the invention may have
preferred features equivalent to corresponding method aspects of the invention.

In order that the invention might be more fully understood, embodiments thereof will now 
be described, by way of example only, with reference to the accompanying drawings, in 
which:- 

Figure 1 is a block diagram of a procedure for measuring pleomorphism to assist in
formulating diagnosis and treatment; and

Figure 2 is a block diagram showing in more detail pleomorphism feature detection,
which is part of the procedure of Figure 1.

A procedure 10 for the assessment of tissue samples in the form of histopathological
slides of potential carcinomas of the breast is shown in Figure 1. This drawing illustrates
processes which measure the degree of pleomorphism for use in assessment of patient
condition.

The procedure 10 employs a database 12, which maintains digitised image data
obtained from histological slides as will be described later. Sections are taken (cut) from
breast tissue samples (biopsies) and placed on respective slides. Slides are stained
using the staining agent Haematoxylin & Eosin (H&E). H&E is a very common stain for
delineating tissue and cellular structure, and tissue stained with H&E is used to assess
pleomorphism.

Pleomorphism is a measure of the degree of cell shape variability within a tissue
sample. In normal tissue samples cell nuclei have a regular structure in terms of shape
and size, whereas in cancerous tissue nuclei can become larger and irregularly shaped,
with a marked variation in shape and size.

In a prior art manual procedure, a clinician places a slide under a microscope with
a magnification of 40X and examines a region of it (often referred to as a tile) for
indications of the degree of pleomorphism. This manual procedure requires a pathologist
to assess subjectively the unusual size and shape of cells in a tissue sample. The values
obtained in this way are combined to give a single measurement for use in diagnosis.
The process of the invention replaces this subjective prior art procedure with an objective
procedure.



In the present example, image data were obtained from histological slides by a
pathologist using a Zeiss Axioskop microscope with a Jenoptiks Progres 3012 digital
camera. The image data from each slide are a set of digital images obtained at a linear
magnification of 40 (i.e. 40X linear, 1600X area), each image being an electronic
equivalent of a tile.

To obtain images, a pathologist scans the microscope over a slide and, at 40X
magnification, selects regions of the slide which appear to be most promising in terms of
pleomorphism assessment. Each of these regions is then photographed using the
microscope and camera mentioned above, and this produces for each region a
respective digitised image in three colours, red, green and blue (R, G & B). Three
intensity values are obtained for each pixel in a pixel array to provide an image as a
combination of R, G and B image planes. This image is stored temporarily at 12 for later
use. Two tiles are required for pleomorphism measurement at 14 by a process 16: the
results of the process 16 are converted into a measurement at 20 for input to a
diagnostic report at 22.

Referring now to Figure 2, the pleomorphism feature detection process 16 is shown in
more detail. It is carried out for each of the two tiles or digitised images (raw (RGB) input
images) mentioned above, and will be described for one such image. At a first stage 30
the raw (RGB) input image is separated into overlapping windows of size 128x128
pixels. In each window, 64 pixels overlap with 64 pixels of respective neighbouring
windows in both horizontal and vertical directions. For example, an image of 256x256
pixels would give a 3x3 set of 128x128 overlapping windows: thus, each window overlaps half
of each of its row and column neighbours. A technique referred to as "Principal Component
Analysis" (PCA, Karhunen-Loeve Transform) is applied to each window. PCA is a
standard mathematical technique described by Jolliffe I.T., 'Principal Component
Analysis', Springer Series in Statistics, Springer-Verlag, 1986. It is also described by
Jackson J.E., 'A User's Guide to Principal Components', John Wiley & Sons, 1991, pp 1-25.
PCA transforms a set of (possibly) correlated variables into a smaller number of
uncorrelated variables called principal components. Of the principal components, the
first accounts for as much of the variability of the variables as possible. PCA involves
calculating a covariance matrix and solving for its eigenvalues and eigenvectors. The
image is now treated as being arranged as an Nx3 matrix, i.e. having N pixels and 3
planes (R, G and B). The covariance matrix is calculated using a formula for its matrix
elements $C_{ij}$ as follows:

$$C_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_k - \mu_x)(y_k - \mu_y)$$



where $C_{ij}$ is the covariance of variable $i$ with variable $j$, $x_k$ and $y_k$ are the $i$th and $j$th
feature values of the $k$th object, $\mu_x$ is the mean of all $N$ values of $x_k$, and $\mu_y$ is the mean of all
$N$ values of $y_k$. The covariance matrix is 3x3 and PCA yields three eigenvectors: the
eigenvectors are treated as a 3x3 matrix, and the Nx3 image matrix is multiplied by it to
produce an Nx3 product matrix. The product matrix has an Nx1 first column which is the
first principal component, which may be considered as the most important component:
it is the component with the maximum eigenvalue, and it provides a greyscale sub-image
(one pixel value for each of N pixels) with a maximum range of information compared to
equivalents associated with other components. PCA therefore produces monochromatic
image data. It is carried out for each of the overlapping windows defined above, and each
provides a respective first principal component and greyscale sub-image of size
128x128 pixels.
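The windowing and PCA of stage 30 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the example: the function names are hypothetical, the window origins assume the image size is 128 plus a multiple of 64 (edge handling for other sizes is not specified above), the covariance uses the $1/N$ normalisation, the dominant eigenvector is found by power iteration rather than a full eigen-solver, and its sign (arbitrary in PCA) is fixed by an assumed convention.

```python
import math


def window_origins(size, win=128, step=64):
    """Offsets of overlapping windows along one axis (50% overlap).

    Assumes size == win + m * step; otherwise trailing pixels are not covered.
    """
    if size <= win:
        return [0]
    return list(range(0, size - win + 1, step))


def covariance_matrix(pixels):
    """3x3 covariance C_ij of N (R, G, B) pixels, normalised by N."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / n
             for j in range(3)]
            for i in range(3)]


def dominant_eigenvector(cov, iterations=200):
    """Unit eigenvector of the largest eigenvalue, found by power iteration.

    Assumes the largest eigenvalue is strictly dominant and the start vector
    is not orthogonal to its eigenvector.
    """
    v = [1 / math.sqrt(3)] * 3
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant window: no variance to explain
            return v
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; this convention is an assumption.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def first_principal_component(pixels):
    """Project each (R, G, B) pixel onto the first eigenvector: one grey value per pixel."""
    v = dominant_eigenvector(covariance_matrix(pixels))
    return [sum(p[c] * v[c] for c in range(3)) for p in pixels]
```

For a 256x256 image, `window_origins(256)` gives `[0, 64, 128]` along each axis, i.e. the 3x3 grid of windows described above. Which polarity of the resulting grey values should make nuclei fall into the above-threshold class at stage 32 is not specified in the description.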

At 32, a thresholding technique referred to as "Otsu" is applied to each sub-image
resulting from 30 to transform it into a respective binary sub-image. Otsu is a standard
thresholding technique published by Otsu N., 'A threshold selection method from
grey-level histograms', IEEE Trans. Systems, Man & Cybernetics, vol. 9, 1979, pp 62-66.
The Otsu threshold selection method chooses the threshold which maximises, for two
classes, the ratio of between-class variance to within-class variance: i.e. the higher the
variance between classes, the better the separation. In the present example the two
classes are a below-threshold class (pixel value 0) and an above-threshold class (pixel
value 1), so Otsu thresholding converts each greyscale sub-image to a binary sub-image
containing a set of blobs: here blobs are image regions (objects in the image), each of
which is a respective group of contiguous pixels all having value 1. The blobs may have
holes (pixel value 0) in them.
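Stage 32 can be sketched with a histogram-based form of Otsu's criterion, which picks the threshold that maximises the between-class variance (equivalent to maximising the between-class to within-class variance ratio, because the total variance is fixed). The function names, the 256-bin histogram over each sub-image's own value range and the use of the upper bin edge as the threshold are assumptions for illustration; first-principal-component values are real-valued, so they are binned rather than used directly as grey levels.

```python
def otsu_threshold(values, bins=256):
    """Return a threshold splitting `values` into two classes by Otsu's method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # a constant sub-image cannot be split into two classes
    width = (hi - lo) / bins
    hist = [0] * bins
    for x in values:
        hist[min(int((x - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_bin, best_between = 0, -1.0
    for t in range(bins):
        w0 += hist[t]            # pixel count of class 0 (bins 0..t)
        if w0 == 0:
            continue
        w1 = total - w0          # pixel count of class 1 (bins t+1..)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if between > best_between:
            best_between, best_bin = between, t
    return lo + (best_bin + 1) * width       # upper edge of the last class-0 bin


def binarise(values, threshold):
    """Below-threshold pixels -> 0; the rest -> 1 (blob pixels)."""
    return [1 if x >= threshold else 0 for x in values]
```

For clearly bimodal data such as `[10, 11, 12, 200, 201, 202]`, the threshold falls between the two groups and `binarise` returns `[0, 0, 0, 1, 1, 1]`.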

At 34 all blobs that touch or are intersected by sub-image boundaries are removed.
Thus, if at any pixel a blob meets a border, it is removed by setting its pixels to a
background pixel value. This is because boundaries meeting blobs produce artificially
straight blob edges which can give misleading results later. Because of sub-image
overlap, a blob which appears partly in one sub-image may appear wholly in another
sub-image.
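Stage 34 can be sketched with a simple breadth-first connected component labelling over a binary sub-image held as a list of rows. The 4-connectivity and the helper names are assumptions for illustration; the description does not state which connectivity is used.

```python
from collections import deque


def label_components(img, value=1):
    """4-connected components of pixels equal to `value`, as lists of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def remove_border_blobs(img, background=0):
    """Return a copy with every blob touching the sub-image border set to background."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for component in label_components(img, 1):
        if any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in component):
            for y, x in component:
                out[y][x] = background
    return out
```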

At 36 the sub-images from 34 are inverted (pixel value 0 changes to 1 and vice versa)
and connected component labelling (CCL) is applied to remove holes in blobs. CCL is a
known image processing technique (sometimes referred to as 'blob colouring') published
by Klette R., Zamperoni P., 'Handbook of Image Processing Operators', John Wiley &
Sons, 1996, and Rosenfeld A., Kak A.C., 'Digital Picture Processing', vols. 1 & 2,
Academic Press, New York, 1982. CCL gives a respective label to each group of
contiguous pixels of pixel value 1. Because of the inversion, the areas of pixel value 1
which are labelled in CCL are now holes within blobs together with background pixels.
Holes within each blob are now removed (filled) by setting their pixels to the value of the
other pixels of the blob. Background pixels are left unchanged.
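A sketch of the hole filling at stage 36, again assuming 4-connectivity and hypothetical helper names. The description does not say how holes are told apart from background after inversion; this sketch assumes that a labelled region of the inverted sub-image which touches the sub-image border is background and that any enclosed region is a hole. That assumption is reasonable here because blobs touching the border were already removed at stage 34.

```python
from collections import deque


def label_components(img, value=1):
    """4-connected components of pixels equal to `value`, as lists of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def fill_holes(img):
    """Fill enclosed 0-regions (holes) inside blobs of a binary sub-image."""
    rows, cols = len(img), len(img[0])
    inverted = [[1 - p for p in row] for row in img]   # holes and background become 1
    out = [row[:] for row in img]
    for component in label_components(inverted, 1):
        touches_border = any(y in (0, rows - 1) or x in (0, cols - 1)
                             for y, x in component)
        if not touches_border:        # assumed to be a hole, not background
            for y, x in component:
                out[y][x] = 1         # give it the blob pixel value
    return out
```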

At 38, the sub-images from 36 are inverted once more and CCL is applied again: due to
this second inversion, the areas labelled by CCL are now filled blobs within each sub-image.
CCL has a facility for removal of blobs with areas smaller than a user-defined minimum
area threshold. In this example the minimum area threshold is 400 pixels: all blobs with
areas less than 400 pixels are rejected and merged into the background by setting their
pixels to a background value (0). Remaining blobs with an area of at least 400 pixels are
accepted for further processing as a set of labelled blobs. CCL also gives for each
remaining blob its perimeter $P$ and area $A$, in numbers of pixels in each case.
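The area filtering and measurement at stage 38 might look like the sketch below. Area is taken as the pixel count of a blob, as stated above; the perimeter definition used here (the number of blob pixels with at least one 4-neighbour outside the blob) is an assumption, since the description only says that CCL gives the perimeter in numbers of pixels. Shape-factor thresholds calibrated with one perimeter definition should not be assumed to carry over to another. The helper names are hypothetical.

```python
from collections import deque


def label_components(img, value=1):
    """4-connected components of pixels equal to `value`, as lists of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def filter_and_measure(img, min_area=400):
    """Remove blobs smaller than min_area; return (filtered image, [(P, A), ...])."""
    out = [row[:] for row in img]
    measurements = []
    for component in label_components(img, 1):
        area = len(component)
        if area < min_area:
            for y, x in component:
                out[y][x] = 0          # merge the small blob into the background
            continue
        members = set(component)
        # Assumed perimeter: blob pixels with at least one 4-neighbour outside the blob.
        perimeter = sum(
            1 for y, x in component
            if any(n not in members
                   for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))))
        measurements.append((perimeter, area))
    return out, measurements
```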

After stages 30 to 38, each sub-image is clear of small unwanted blobs, and all remaining
blobs have been filled to remove holes within them, so they consist of pixels which all
have the same value. The advantage of stages 30 to 38 is that they provide spatial filtering
but do not appreciably affect the perimeter shapes of blobs, which is important for
subsequent processing. Such filtering is not essential, but it is helpful to reduce the
processing burden.

At 40, the sub-images output from step 38 are reassembled into a new binary image: the
new binary image has the same size as the original raw (RGB) input image, and
contains only blobs that will be assessed in subsequent pleomorphism processing.
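The description does not specify how overlapping windows are merged at stage 40. One plausible choice, assumed in this sketch, is a pixel-wise logical OR: a blob removed from one window because that window's border cut it is still contributed whole by a neighbouring window in which it lies entirely. The function name and the `((top, left), sub_image)` input format are hypothetical.

```python
def reassemble(windows, height, width):
    """OR-combine binary sub-images into a height x width binary image.

    `windows` is a list of ((top, left), sub_image) pairs, where sub_image is a
    list of rows of 0/1 values placed with its top-left corner at (top, left).
    """
    image = [[0] * width for _ in range(height)]
    for (top, left), sub in windows:
        for dy, row in enumerate(sub):
            for dx, pixel in enumerate(row):
                if pixel:
                    image[top + dy][left + dx] = 1
    return image
```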

Steps 30 to 40 are pre-processing steps which identify a set of blobs within the original
raw (RGB) input image: each of these blobs should correspond to a cell. At 42 a
statistical analysis is applied. A shape factor $S$ is calculated for each blob from the
blob perimeter $P$ and area $A$ obtained in CCL at 38:

$$S = \frac{P^2}{A}$$

$S$ could also be a multiple or fraction of $P^2/A$ were this to be convenient, and other
functions of $P$ and $A$ could be used to indicate shape. The value of $S$ increases with
increasing irregularity of blob shape; it is $4\pi$ (≈ 12.57) for a perfect circle. A mean
value $S_m$ of $S$ is calculated for all blobs in the image by adding their $S$ values
together and dividing by their number. $S_m$ is thresholded to derive a measure of the
pleomorphism of the original raw (RGB) input image: the $S_m$ thresholds were derived
from an analysis of a test set of 200 pleomorphism images. There are three threshold
categories, $S_m \le 30$ (low), $30 < S_m \le 35$ (moderate) and $S_m > 35$ (high), with
pleomorphism scores 1, 2 and 3 respectively, as tabulated below. These thresholds are
for $S = P^2/A$: other shape factor expressions would need different thresholds.
$S_m$: Mean Value of Blob Shape Factor      Pleomorphism Score
Low, $S_m \le 30$                           1
Moderate, $30 < S_m \le 35$                 2
High, $S_m > 35$                            3
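The scoring at stage 42, together with the combination of the two tile scores at 20, can be sketched as below. The function names are hypothetical; the thresholds 30 and 35 and the score mapping are those in the table and apply only to $S = P^2/A$ computed with the perimeter measure used in the example. For a continuous circle of radius $r$, $P = 2\pi r$ and $A = \pi r^2$, so $S = 4\pi$, the minimum for any continuous shape (isoperimetric inequality); pixel-count perimeters of digitised blobs will not reproduce this exactly.

```python
import math


def shape_factor(perimeter, area):
    """S = P^2 / A; larger values indicate more irregular shapes."""
    return perimeter ** 2 / area


def tile_score(measurements, low=30.0, high=35.0):
    """Map the mean shape factor S_m of (P, A) pairs to a score of 1, 2 or 3."""
    factors = [shape_factor(p, a) for p, a in measurements]
    s_m = sum(factors) / len(factors)
    if s_m <= low:
        return 1   # low pleomorphism
    if s_m <= high:
        return 2   # moderate pleomorphism
    return 3       # high pleomorphism


def overall_score(tile_scores):
    """The overall pleomorphism score is the maximum of the per-tile scores."""
    return max(tile_scores)
```

For example, two blobs with $S = 25$ (P = 20, A = 16) and $S = 36$ (P = 60, A = 100) give $S_m = 30.5$ and hence a moderate score of 2.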



$S_m$ may be calculated by other procedures. It might simply be the median of the values
of $S$: the central value of an odd number of values, or the average of the two central
values of an even number of values. A weighted mean $S_{wm}$ could be calculated by
multiplying each value $S_i$ of $S$ by a weight factor $w_i$, adding the weighted values
and dividing their sum by the sum of the weights; each weight would indicate the
probability of the associated blob or cell giving a reliable indication of pleomorphism, i.e.

$$S_{wm} = \frac{\sum_i w_i S_i}{\sum_i w_i}$$
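The alternative averages can be sketched with Python's standard `statistics` module, whose `median` returns the central value for an odd count and the mean of the two central values for an even count, matching the description above. The function names are hypothetical, and how the weights $w_i$ would be estimated is not specified in the description.

```python
import statistics


def median_shape_factor(factors):
    """Median of the shape factors S."""
    return statistics.median(factors)


def weighted_mean_shape_factor(factors, weights):
    """S_wm = sum(w_i * S_i) / sum(w_i)."""
    if len(factors) != len(weights):
        raise ValueError("need one weight per shape factor")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, factors)) / total_weight
```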




At 20 the score for the pleomorphism measurement has a value 1, 2 or 3 and a 
respective score is obtained for each of the input tiles. The maximum of these two 
scores is taken as the overall pleomorphism score. 
Measurement of Nuclear Pleomorphism    Meaning                                            Points
Uniform                                Minimal increase in size and variation in size     1
                                       compared to normal cells, i.e. cells are
                                       relatively small and uniform in size.
Moderate Variation                     Moderate increase and variation in size and        2
                                       shape with vesicular nuclei.
Marked Variation                       Marked variation in size and shape with            3
                                       vesicular nuclei.

The measurement of pleomorphism may be combined with others obtained for mitosis 
and tubules by different methods to derive an overall grading referred to in medicine as 
a "Bloom and Richardson grading": it is used by clinicians as a measure of cancer 

The examples given in the foregoing description for calculating results can clearly be
evaluated by an appropriate computer program recorded on a carrier medium and
running on a conventional computer system. Such a program is straightforward for a
skilled programmer to implement without requiring invention because the procedures are
well known, and it will therefore not be described further.



1. A method of histological assessment of pleomorphism by identifying image
regions potentially corresponding to cells in histological image data, characterised 
in that the method also includes determining perimeters and areas of identified 
image regions, calculating region shape factors from the perimeters and areas and 
determining an assessment of pleomorphism from the shape factors. 

2. A method according to Claim 1 characterised in that the shape factors are rational
functions of perimeters and areas. 

3. A method according to Claim 1 characterised in that each shape factor is equal or
proportional to square of perimeter divided by area. 

4. A method according to Claim 3 characterised in that the step of determining an
assessment of pleomorphism includes determining a mean or median of the shape 
factors. 

5. A method according to Claim 4 characterised in that the step of determining an
assessment of pleomorphism includes thresholding the mean or median of the 
shape factors. 

6. A method according to Claim 5 characterised in that the step of determining an
assessment of pleomorphism includes determining pleomorphism as being 
relatively low, moderate or high according to whether the mean or median of the 
shape factors is relatively low, moderate or high. 

7. A method according to Claim 1 characterised in that the step of determining an
assessment of pleomorphism includes determining a mean or median of the shape 
factors. 

8. A method according to Claim 1 characterised in that the step of identifying image
regions potentially corresponding to cells in histological image data includes 
filtering the image data to overwrite regions which are not of interest using a 
filtering process which does not appreciably affect image region perimeter. 




9. A method according to Claim 8 characterised in that the step of overwriting regions 
which are not of interest includes setting relatively small image regions to a 
background pixel value and setting hole pixels in relatively larger image regions to 
a non-hole image region pixel value. 

10. A method according to Claim 1 characterised in that the step of identifying image 
regions potentially corresponding to cells in histological image data includes using 
principal component analysis to derive monochromatic image data. 

11. A method according to Claim 1 or 10 characterised in that the step of identifying 
image regions potentially corresponding to cells in histological image data includes 
dividing the image data into overlapping sub-images, removing from the sub- 
images: 

a) image regions touching or intersecting sub-image boundaries, 

b) unsuitably small image regions, and 

c) holes in relatively large image regions, and 
reassembling the sub-images. 

12. Computer apparatus for histological assessment of pleomorphism by identification 
of image regions potentially corresponding to cells in histological image data, 
characterised in that the apparatus is programmed to execute the steps of 
determining perimeters and areas of identified image regions, calculating region 
shape factors from the perimeters and areas and determining an assessment of 
pleomorphism from the shape factors. 

13. Apparatus according to Claim 12 characterised in that the shape factors are 
rational functions of perimeters and areas. 

14. Apparatus according to Claim 12 characterised in that each shape factor is equal 
or proportional to square of perimeter divided by area. 

15. Apparatus according to Claim 14 characterised in that the step of determining an 
assessment of pleomorphism includes determining a mean or median of the shape 
factors. 
16. Apparatus according to Claim 15 characterised in that the step of determining an
assessment of pleomorphism includes thresholding the mean or median of the
shape factors.



17. Apparatus according to Claim 16 characterised in that the step of determining an
assessment of pleomorphism includes determining pleomorphism as being
relatively low, moderate or high according to whether the mean or median of the
shape factors is relatively low, moderate or high.

18. Apparatus according to Claim 12 characterised in that the step of determining an
assessment of pleomorphism includes determining a mean or median of the shape
factors.

19. Apparatus according to Claim 12 characterised in that the step of identifying image
regions potentially corresponding to cells in histological image data includes
filtering the image data to overwrite regions which are not of interest using a
filtering process which does not appreciably affect image region perimeter.

20. Apparatus according to Claim 19 characterised in that the step of overwriting
regions which are not of interest includes setting relatively small image regions to a
background pixel value and setting hole pixels in relatively larger image regions to
a non-hole image region pixel value.

21. Apparatus according to Claim 12 characterised in that the step of identifying image
regions potentially corresponding to cells in histological image data includes using
principal component analysis to derive monochromatic image data.

22. Apparatus according to Claim 12 or 21 characterised in that the step of identifying
image regions potentially corresponding to cells in histological image data includes
dividing the image data into overlapping sub-images, removing from the sub-images:

a) image regions touching or intersecting sub-image boundaries,

b) unsuitably small image regions, and

c) holes in relatively large image regions, and
reassembling the sub-images.



23. A computer program for use in histological assessment of pleomorphism from
identification of image regions potentially corresponding to cells in histological
image data, characterised in that the program contains instructions to control
computer apparatus to determine perimeters and areas of identified image regions,
calculate region shape factors from the perimeters and areas and determine an
assessment of pleomorphism from the shape factors.

24. A computer program according to Claim 23 characterised in that the shape factors
are rational functions of perimeters and areas.

25. A computer program according to Claim 23 characterised in that each shape factor
is equal or proportional to square of perimeter divided by area.

26. A computer program according to Claim 25 characterised in that the program
instructions provide for determination of an assessment of pleomorphism using a
mean or median of the shape factors.

27. A computer program according to Claim 26 characterised in that the program
instructions provide for determination of an assessment of pleomorphism using
thresholding of the mean or median of the shape factors.

28. A computer program according to Claim 27 characterised in that the program 
instructions provide for determination of pleomorphism as being relatively low, 
moderate or high according to whether the mean or median of the shape factors is 
relatively low, moderate or high. 

29. A computer program according to Claim 23 characterised in that the program 
instructions provide for determination of an assessment of pleomorphism using a 
mean or median of the shape factors. 

30. A computer program according to Claim 23 characterised in that the program 
instructions provide for identification of image regions potentially corresponding to 
cells in histological image data by a procedure which includes filtering the image 
data to overwrite regions which are not of interest using a filtering process which 
does not appreciably affect image region perimeter. 
31. A computer program according to Claim 30 characterised in that the program
instructions provide for overwriting regions which are not of interest by a procedure 
which includes setting relatively small image regions to a background pixel value 
and setting hole pixels in relatively larger image regions to a non-hole image region 
pixel value. 

32. A computer program according to Claim 23 characterised in that the program 
instructions provide for identification of image regions potentially corresponding to 
cells in histological image data by a procedure which includes principal component 
analysis to derive monochromatic image data. 

33. A computer program according to Claim 23 or 32 characterised in that the program 
instructions provide for identification of image regions potentially corresponding to 
cells in histological image data by a procedure which includes dividing the image 
data into overlapping sub-images, removing from the sub-images: 

a) image regions touching or intersecting sub-image boundaries, 

b) unsuitably small image regions, and 

c) holes in relatively large image regions, and 
reassembling the sub-images. 




ABSTRACT 



A method of histological assessment of pleomorphism identifies potential cells using
principal component analysis to derive monochromatic image data, dividing the image
data into overlapping sub-images, removing image regions at sub-image boundaries,
unsuitably small image regions and holes in relatively large image regions, and
reassembling the sub-images. Perimeters ($P$) and areas ($A$) of potential cells are
determined and used in calculating cell shape factors $P^2/A$. Pleomorphism is assessed as
relatively low, moderate or high according to whether predetermined thresholds indicate a
mean cell shape factor is relatively low, moderate or high.



Figure 2 should accompany the Abstract 